

<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8" />
  
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  
  <title>RADOS 客户端协议 &mdash; Ceph Documentation</title>
  

  
  <link rel="stylesheet" href="../../_static/ceph.css" type="text/css" />
  <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
  <link rel="stylesheet" href="../../_static/graphviz.css" type="text/css" />
  <link rel="stylesheet" href="../../_static/css/custom.css" type="text/css" />

  
  
    <link rel="shortcut icon" href="../../_static/favicon.ico"/>
  

  
  

  

  
  <!--[if lt IE 9]>
    <script src="../../_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
    
      <script type="text/javascript" id="documentation_options" data-url_root="../../" src="../../_static/documentation_options.js"></script>
        <script src="../../_static/jquery.js"></script>
        <script src="../../_static/underscore.js"></script>
        <script src="../../_static/doctools.js"></script>
    
    <script type="text/javascript" src="../../_static/js/theme.js"></script>

    
    <link rel="index" title="Index" href="../../genindex/" />
    <link rel="search" title="Search" href="../../search/" />
    <link rel="next" title="RBD 增量备份" href="../rbd-diff/" />
    <link rel="prev" title="开发者指南（快速）" href="../quick_guide/" /> 
</head>

<body class="wy-body-for-nav">

   
  <header class="top-bar">
    

















<div role="navigation" aria-label="breadcrumbs navigation">

  <ul class="wy-breadcrumbs">
    
      <li><a href="../../" class="icon icon-home"></a> &raquo;</li>
        
          <li><a href="../internals/">Ceph 内幕</a> &raquo;</li>
        
      <li>RADOS 客户端协议</li>
    
    
      <li class="wy-breadcrumbs-aside">
        
          
            <a href="../../_sources/dev/rados-client-protocol.rst.txt" rel="nofollow"> View page source</a>
          
        
      </li>
    
  </ul>

  
  <hr/>
</div>
  </header>
  <div class="wy-grid-for-nav">
    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search"  style="background: #eee" >
          

          
            <a href="../../">
          

          
            
            <img src="../../_static/logo.png" class="logo" alt="Logo"/>
          
          </a>

          

          
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="../../search/" method="get">
    <input type="text" name="q" placeholder="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>

          
        </div>

        
        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
          
            
            
              
            
            
              <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../start/intro/">Ceph 简介</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../install/">安装 Ceph</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../cephadm/">Cephadm</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../rados/">Ceph 存储集群</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../cephfs/">Ceph 文件系统</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../rbd/">Ceph 块设备</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../radosgw/">Ceph 对象网关</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../mgr/">Ceph 管理器守护进程</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../mgr/dashboard/">Ceph 仪表盘</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../api/">API 文档</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../architecture/">体系结构</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developer_guide/">开发者指南</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../internals/">Ceph 内幕</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../blkin/">Tracing Ceph With LTTng</a></li>
<li class="toctree-l2"><a class="reference internal" href="../blkin/#tracing-ceph-with-blkin">Tracing Ceph With Blkin</a></li>
<li class="toctree-l2"><a class="reference internal" href="../bluestore/">BlueStore Internals</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cache-pool/">Cache pool</a></li>
<li class="toctree-l2"><a class="reference internal" href="../ceph_krb_auth/">如何配置好 Ceph Kerberos 认证的详细文档</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cephfs-mirroring/">CephFS Mirroring</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cephfs-reclaim/">CephFS Reclaim Interface</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cephfs-snapshots/">CephFS 快照</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cephx/">Cephx</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cephx_protocol/">Cephx 认证协议详细阐述</a></li>
<li class="toctree-l2"><a class="reference internal" href="../config/">配置管理系统</a></li>
<li class="toctree-l2"><a class="reference internal" href="../config-key/">config-key layout</a></li>
<li class="toctree-l2"><a class="reference internal" href="../context/">CephContext</a></li>
<li class="toctree-l2"><a class="reference internal" href="../continuous-integration/">Continuous Integration Architecture</a></li>
<li class="toctree-l2"><a class="reference internal" href="../corpus/">资料库结构</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cpu-profiler/">Oprofile 的安装</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cxx/">C++17 and libstdc++ ABI</a></li>
<li class="toctree-l2"><a class="reference internal" href="../deduplication/">去重</a></li>
<li class="toctree-l2"><a class="reference internal" href="../delayed-delete/">CephFS delayed deletion</a></li>
<li class="toctree-l2"><a class="reference internal" href="../dev_cluster_deployement/">开发集群的部署</a></li>
<li class="toctree-l2"><a class="reference internal" href="../dev_cluster_deployement/#id5">在同一机器上部署多套开发集群</a></li>
<li class="toctree-l2"><a class="reference internal" href="../development-workflow/">开发流程</a></li>
<li class="toctree-l2"><a class="reference internal" href="../documenting/">为 Ceph 写作文档</a></li>
<li class="toctree-l2"><a class="reference internal" href="../encoding/">序列化（编码、解码）</a></li>
<li class="toctree-l2"><a class="reference internal" href="../erasure-coded-pool/">纠删码存储池</a></li>
<li class="toctree-l2"><a class="reference internal" href="../file-striping/">File striping</a></li>
<li class="toctree-l2"><a class="reference internal" href="../freebsd/">FreeBSD Implementation details</a></li>
<li class="toctree-l2"><a class="reference internal" href="../generatedocs/">Ceph 文档的构建</a></li>
<li class="toctree-l2"><a class="reference internal" href="../health-reports/">Health Reports</a></li>
<li class="toctree-l2"><a class="reference internal" href="../iana/">IANA 号</a></li>
<li class="toctree-l2"><a class="reference internal" href="../kubernetes/">Hacking on Ceph in Kubernetes with Rook</a></li>
<li class="toctree-l2"><a class="reference internal" href="../libs/">库体系结构</a></li>
<li class="toctree-l2"><a class="reference internal" href="../logging/">集群日志的用法</a></li>
<li class="toctree-l2"><a class="reference internal" href="../logs/">调试日志</a></li>
<li class="toctree-l2"><a class="reference internal" href="../macos/">在 MacOS 上构建</a></li>
<li class="toctree-l2"><a class="reference internal" href="../messenger/">Messenger notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="../mon-bootstrap/">Monitor bootstrap</a></li>
<li class="toctree-l2"><a class="reference internal" href="../mon-elections/">Monitor Elections</a></li>
<li class="toctree-l2"><a class="reference internal" href="../mon-on-disk-formats/">ON-DISK FORMAT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../mon-osdmap-prune/">FULL OSDMAP VERSION PRUNING</a></li>
<li class="toctree-l2"><a class="reference internal" href="../msgr2/">msgr2 协议（ msgr2.0 和 msgr2.1 ）</a></li>
<li class="toctree-l2"><a class="reference internal" href="../network-encoding/">Network Encoding</a></li>
<li class="toctree-l2"><a class="reference internal" href="../network-protocol/">网络协议</a></li>
<li class="toctree-l2"><a class="reference internal" href="../object-store/">对象存储架构概述</a></li>
<li class="toctree-l2"><a class="reference internal" href="../osd-class-path/">OSD class path issues</a></li>
<li class="toctree-l2"><a class="reference internal" href="../peering/">互联</a></li>
<li class="toctree-l2"><a class="reference internal" href="../perf/">Using perf</a></li>
<li class="toctree-l2"><a class="reference internal" href="../perf_counters/">性能计数器</a></li>
<li class="toctree-l2"><a class="reference internal" href="../perf_histograms/">Perf histograms</a></li>
<li class="toctree-l2"><a class="reference internal" href="../placement-group/">PG （归置组）说明</a></li>
<li class="toctree-l2"><a class="reference internal" href="../quick_guide/">开发者指南（快速）</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">RADOS 客户端协议</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#basics">Basics</a></li>
<li class="toctree-l3"><a class="reference internal" href="#resends">Resends</a></li>
<li class="toctree-l3"><a class="reference internal" href="#backoff">Backoff</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../rbd-diff/">RBD 增量备份</a></li>
<li class="toctree-l2"><a class="reference internal" href="../rbd-export/">RBD Export &amp; Import</a></li>
<li class="toctree-l2"><a class="reference internal" href="../rbd-layering/">RBD Layering</a></li>
<li class="toctree-l2"><a class="reference internal" href="../release-checklists/">Release checklists</a></li>
<li class="toctree-l2"><a class="reference internal" href="../release-process/">Ceph Release Process</a></li>
<li class="toctree-l2"><a class="reference internal" href="../seastore/">SeaStore</a></li>
<li class="toctree-l2"><a class="reference internal" href="../sepia/">Sepia 社区测试实验室</a></li>
<li class="toctree-l2"><a class="reference internal" href="../session_authentication/">Session Authentication for the Cephx Protocol</a></li>
<li class="toctree-l2"><a class="reference internal" href="../testing/">测试笔记</a></li>
<li class="toctree-l2"><a class="reference internal" href="../versions/">Public OSD Version</a></li>
<li class="toctree-l2"><a class="reference internal" href="../vstart-ganesha/">NFS CephFS-RGW Developer Guide</a></li>
<li class="toctree-l2"><a class="reference internal" href="../wireshark/">Wireshark Dissector</a></li>
<li class="toctree-l2"><a class="reference internal" href="../zoned-storage/">Zoned Storage Support</a></li>
<li class="toctree-l2"><a class="reference internal" href="../osd_internals/">OSD 开发者文档</a></li>
<li class="toctree-l2"><a class="reference internal" href="../mds_internals/">MDS 开发者文档</a></li>
<li class="toctree-l2"><a class="reference internal" href="../radosgw/">RADOS 网关开发者文档</a></li>
<li class="toctree-l2"><a class="reference internal" href="../ceph-volume/">ceph-volume 开发者文档</a></li>
<li class="toctree-l2"><a class="reference internal" href="../crimson/">Crimson developer documentation</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../governance/">项目管理</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../foundation/">Ceph 基金会</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../ceph-volume/">ceph-volume</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../releases/general/">Ceph 版本（总目录）</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../releases/">Ceph 版本（索引）</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../security/">Security</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../glossary/">Ceph 术语</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../jaegertracing/">Tracing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../translation_cn/">中文版翻译资源</a></li>
</ul>

            
          
        </div>
        
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      
      <nav class="wy-nav-top" aria-label="top navigation">
        
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="../../">Ceph</a>
        
      </nav>


      <div class="wy-nav-content">
        
        <div class="rst-content">
        
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
            
<div id="dev-warning" class="admonition note">
  <p class="first admonition-title">Notice</p>
  <p class="last">This document is for a development version of Ceph.</p>
</div>
  <div id="docubetter" align="right" style="padding: 5px; font-weight: bold;">
    <a href="https://pad.ceph.com/p/Report_Documentation_Bugs">Report a Documentation Bug</a>
  </div>

  
  <div class="section" id="rados">
<span id="rados-client-protocol"></span><h1>RADOS 客户端协议<a class="headerlink" href="#rados" title="Permalink to this headline">¶</a></h1>
<p>This is very incomplete, but one must start somewhere.</p>
<div class="section" id="basics">
<h2>Basics<a class="headerlink" href="#basics" title="Permalink to this headline">¶</a></h2>
<p>Requests are MOSDOp messages.  Replies are MOSDOpReply messages.</p>
<p>An object request is targetted at an hobject_t, which includes a pool,
hash value, object name, placement key (usually empty), and snapid.</p>
<p>The hash value is a 32-bit hash value, normally generated by hashing
the object name.  The hobject_t can be arbitrarily constructed,
though, with any hash value and name.  Note that in the MOSDOp these
components are spread across several fields and not logically
assembled in an actual hobject_t member (mainly historical reasons).</p>
<p>A request can also target a PG.  In this case, the <em>ps</em> value matches
a specific PG, the object name is empty, and (hopefully) the ops in
the request are PG ops.</p>
<p>Either way, the request ultimately targets a PG, either by using the
explicit pgid or by folding the hash value onto the current number of
pgs in the pool.  The client sends the request to the primary for the
assocated PG.</p>
<p>Each request is assigned a unique tid.</p>
</div>
<div class="section" id="resends">
<h2>Resends<a class="headerlink" href="#resends" title="Permalink to this headline">¶</a></h2>
<p>If there is a connection drop, the client will resend any outstanding
requets.</p>
<p>Any time there is a PG mapping change such that the primary changes,
the client is responsible for resending the request.  Note that
although there may be an interval change from the OSD’s perspective
(triggering PG peering), if the primary doesn’t change then the client
need not resend.</p>
<p>There are a few exceptions to this rule:</p>
<blockquote>
<div><ul class="simple">
<li><p>There is a last_force_op_resend field in the pg_pool_t in the
OSDMap.  If this changes, then the clients are forced to resend any
outstanding requests. (This happens when tiering is adjusted, for
example.)</p></li>
<li><p>Some requests are such that they are resent on <em>any</em> PG interval
change, as defined by pg_interval_t’s is_new_interval() (the same
criteria used by peering in the OSD).</p></li>
<li><p>If the PAUSE OSDMap flag is set and unset.</p></li>
</ul>
</div></blockquote>
<p>Each time a request is sent to the OSD the <em>attempt</em> field is incremented. The
first time it is 0, the next 1, etc.</p>
</div>
<div class="section" id="backoff">
<h2>Backoff<a class="headerlink" href="#backoff" title="Permalink to this headline">¶</a></h2>
<p>Ordinarily the OSD will simply queue any requests it can’t immeidately
process in memory until such time as it can.  This can become
problematic because the OSD limits the total amount of RAM consumed by
incoming messages: if either of the thresholds for the number of
messages or the number of bytes is reached, new messages will not be
read off the network socket, causing backpressure through the network.</p>
<p>In some cases, though, the OSD knows or expects that a PG or object
will be unavailable for some time and does not want to consume memory
by queuing requests.  In these cases it can send a MOSDBackoff message
to the client.</p>
<p>A backoff request has four properties:</p>
<ol class="arabic simple">
<li><p>the op code (block, unblock, or ack-block)</p></li>
<li><p><em>id</em>, a unique id assigned within this session</p></li>
<li><p>hobject_t begin</p></li>
<li><p>hobject_t end</p></li>
</ol>
<p>There are two types of backoff: a <em>PG</em> backoff will plug all requests
targetting an entire PG at the client, as described by a range of the
hash/hobject_t space [begin,end), while an <em>object</em> backoff will plug
all requests targetting a single object (begin == end).</p>
<p>When the client receives a <em>block</em> backoff message, it is now
responsible for <em>not</em> sending any requests for hobject_ts described by
the backoff.  The backoff remains in effect until the backoff is
cleared (via an ‘unblock’ message) or the OSD session is closed.  A
<em>ack_block</em> message is sent back to the OSD immediately to acknowledge
receipt of the backoff.</p>
<p>When an unblock is
received, it will reference a specific id that the client previous had
blocked.  However, the range described by the unblock may be smaller
than the original range, as the PG may have split on the OSD.  The unblock
should <em>only</em> unblock the range specified in the unblock message.  Any requests
that fall within the unblock request range are reexamined and, if no other
installed backoff applies, resent.</p>
<p>On the OSD, Backoffs are also tracked across ranges of the hash space, and
exist in three states:</p>
<ol class="arabic simple">
<li><p>new</p></li>
<li><p>acked</p></li>
<li><p>deleting</p></li>
</ol>
<p>A newly installed backoff is set to <em>new</em> and a message is sent to the
client.  When the <em>ack-block</em> message is received it is changed to the
<em>acked</em> state.  The OSD may process other messages from the client that
are covered by the backoff in the <em>new</em> state, but once the backoff is
<em>acked</em> it should never see a blocked request unless there is a bug.</p>
<p>If the OSD wants to a remove a backoff in the <em>acked</em> state it can
simply remove it and notify the client.  If the backoff is in the
<em>new</em> state it must move it to the <em>deleting</em> state and continue to
use it to discard client requests until the <em>ack-block</em> message is
received, at which point it can finally be removed.  This is necessary to
preserve the order of operations processed by the OSD.</p>
</div>
</div>



           </div>
           
          </div>
          <footer>
    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
        <a href="../rbd-diff/" class="btn btn-neutral float-right" title="RBD 增量备份" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
        <a href="../quick_guide/" class="btn btn-neutral float-left" title="开发者指南（快速）" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
    </div>

  <hr/>

  <div role="contentinfo">
    <p>
        &#169; Copyright 2016, Ceph authors and contributors. Licensed under Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0).

    </p>
  </div> 

</footer>
        </div>
      </div>

    </section>

  </div>
  

  <script type="text/javascript">
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script>

  
  
    
   

</body>
</html>